Transfer Learning With Efficient Estimators to Optimally Leverage Historical Data in Analysis of Randomized Trials
Randomized controlled trials (RCTs) are a cornerstone of comparative
effectiveness research because they remove the confounding bias present in observational
studies. However, RCTs are typically much smaller than observational studies
because of financial and ethical considerations. It is therefore of great
interest to be able to incorporate plentiful observational data into the
analysis of smaller RCTs. Previous estimators developed for this purpose rely
on unrealistic additional assumptions without which the added data can bias the
effect estimate. Recent work proposed an alternative method (prognostic
adjustment) that imposes no additional assumption and increases efficiency in
the analysis of RCTs. The idea is to use the observational data to learn a
prognostic model: a regression of the outcome onto the covariates. The
predictions from this model, generated from the RCT subjects' baseline
variables, are used as a covariate in a linear model. In this work, we extend
this framework to inference with nonparametric efficient estimators in trial
analysis. Using simulations, we find that this approach
provides greater power (i.e., smaller standard errors) than without prognostic
adjustment, especially when the trial is small. We also find that the method is
robust to observed or unobserved shifts between the observational and trial
populations and does not introduce bias. Lastly, we showcase this estimator
leveraging real-world historical data on a randomized blood transfusion study
of trauma patients.
Comment: 12 pages, 3 figures
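The prognostic-adjustment idea above can be sketched on synthetic data: fit a regression on a large "observational" sample, score the RCT subjects' baseline covariates with it, and include that score as a covariate when estimating the treatment effect. This is a minimal illustration with ordinary least squares standing in for an arbitrary prognostic learner; all data, sizes, and coefficients are invented, and the paper's actual estimators are nonparametric and efficient rather than this simple linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a large observational cohort and a small RCT
# sharing the same covariate-outcome relationship (all values illustrative).
n_obs, n_rct, p = 5000, 200, 3
beta = np.array([1.0, -2.0, 0.5])

X_obs = rng.normal(size=(n_obs, p))
y_obs = X_obs @ beta + rng.normal(size=n_obs)

X_rct = rng.normal(size=(n_rct, p))
A = rng.integers(0, 2, size=n_rct)        # randomized treatment assignment
tau = 1.5                                 # true average treatment effect
y_rct = tau * A + X_rct @ beta + rng.normal(size=n_rct)

# Step 1: learn a prognostic model on the observational data
# (ordinary least squares here; any regression learner could be used).
coef, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)

# Step 2: generate prognostic scores from the RCT subjects' baseline variables.
m_hat = X_rct @ coef

# Step 3: adjusted analysis -- regress the RCT outcome on an intercept,
# treatment, and the prognostic score.
Z = np.column_stack([np.ones(n_rct), A, m_hat])
est, *_ = np.linalg.lstsq(Z, y_rct, rcond=None)
tau_hat = est[1]

# Unadjusted comparison for reference (intercept and treatment only).
Z0 = np.column_stack([np.ones(n_rct), A])
est0, *_ = np.linalg.lstsq(Z0, y_rct, rcond=None)
print(f"adjusted: {tau_hat:.2f}, unadjusted: {est0[1]:.2f}")
```

Because the prognostic score absorbs outcome variance explained by the covariates, the adjusted estimate has a markedly smaller standard error than the unadjusted difference in means, which is the efficiency gain the abstract describes.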
Cross-Validated Decision Trees with Targeted Maximum Likelihood Estimation for Nonparametric Causal Mixtures Analysis
Exposure to mixtures of chemicals, such as drugs, pollutants, and nutrients,
is common in real-world exposure or treatment scenarios. To understand the
impact of these exposures on health outcomes, an interpretable and important
approach is to estimate the causal effect of exposure regions that are most
associated with a health outcome. This requires a statistical estimator that
can identify these exposure regions and provide an unbiased estimate of a
causal target parameter given the region. In this work, we present a
methodology that uses decision trees to data-adaptively determine exposure
regions and employs cross-validated targeted maximum likelihood estimation to
unbiasedly estimate the average regional-exposure effect (ARE). This results in
a plug-in estimator with an asymptotically normal distribution and minimum
variance, from which confidence intervals can be derived. The methodology is
implemented in the open-source R package CVtreeMLE. Analysts supply a vector
of exposures, covariates, and an outcome, and the package returns tables of
exposure regions, such as lead > 2.1 & arsenic > 1.4, each with an
associated ARE representing the mean outcome difference if all individuals
were exposed to this region compared to if none were exposed to this region.
CVtreeMLE enables researchers to discover interpretable exposure regions in
mixed exposure scenarios and provides robust statistical inference for the
impact of these regions. The resulting quantities offer interpretable
thresholds that can inform public health policies, such as pollutant
regulations, or aid in medical decision-making, such as identifying the most
effective drug combinations.
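The target quantity, the average regional-exposure effect (ARE), can be illustrated on synthetic data. The sketch below hard-codes the abstract's example region (lead > 2.1 & arsenic > 1.4) rather than learning it with decision trees, and contrasts a naive in-region vs. out-of-region difference with a regression-adjusted plug-in; the actual CVtreeMLE package discovers the region data-adaptively and debiases the estimate with cross-validated TMLE, neither of which is implemented here. All variable names and effect sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic mixture data (names mirror the abstract's example).
n = 2000
lead = rng.exponential(1.5, n)
arsenic = rng.exponential(1.0, n)

# True data-generating model: membership in the region lead > 2.1 &
# arsenic > 1.4 raises the outcome by 2.0 on top of smooth exposure effects.
in_region = (lead > 2.1) & (arsenic > 1.4)
y = 0.3 * lead + 0.2 * arsenic + 2.0 * in_region + rng.normal(size=n)

# Naive contrast: mean outcome in-region minus out-of-region.  This is
# confounded, because in-region subjects also have higher lead and arsenic.
are_naive = y[in_region].mean() - y[~in_region].mean()

# Covariate-adjusted plug-in: the coefficient on the region indicator after
# adjusting for the smooth exposure effects recovers the regional effect.
X = np.column_stack([np.ones(n), lead, arsenic, in_region.astype(float)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
are_adj = coef[3]
print(f"naive: {are_naive:.2f}, adjusted: {are_adj:.2f} (truth 2.0)")
```

The gap between the two estimates is why a debiased estimator such as CV-TMLE, rather than a raw regional mean difference, is needed for valid inference on the region's effect.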
Discovering Patient Phenotypes Using Generalized Low Rank Models
The practice of medicine is predicated on discovering commonalities or distinguishing characteristics among patients
to inform corresponding treatment. Given a patient grouping (hereafter referred to as a phenotype), clinicians can
implement a treatment pathway accounting for the underlying cause of disease in that phenotype. Traditionally,
phenotypes have been discovered by intuition, experience in practice, and advancements in basic science, but these
approaches are often heuristic, labor intensive, and can take decades to produce actionable knowledge. Although our
understanding of disease has progressed substantially in the past century, there are still important domains in which
our phenotypes are murky, such as in behavioral health or in hospital settings. To accelerate phenotype discovery,
researchers have used machine learning to find patterns in electronic health records, but have often been thwarted by
missing data, sparsity, and data heterogeneity. In this study, we use a flexible framework called Generalized Low
Rank Modeling (GLRM) to overcome these barriers and discover phenotypes in two sources of patient data. First, we
analyze data from the 2010 Healthcare Cost and Utilization Project National Inpatient Sample (NIS), which contains
upwards of 8 million hospitalization records consisting of administrative codes and demographic information. Second,
we analyze a small (N=1746), local dataset documenting the clinical progression of autism spectrum disorder patients using granular features from the electronic health record, including text from physician notes. We demonstrate that
low rank modeling successfully captures known and putative phenotypes in these vastly different datasets.
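The core of low rank modeling can be sketched as factorizing a patient-by-feature matrix A into patient embeddings X and feature loadings Y with A ≈ XY. The sketch below uses a truncated SVD, which is the quadratic-loss special case of a GLRM; the full framework additionally mixes per-column losses (e.g., logistic for diagnosis codes, quadratic for labs), regularizers, and missing entries, none of which are shown. The synthetic "phenotype" structure and all dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical synthetic patient-by-code matrix driven by k latent phenotypes.
n_patients, n_codes, k = 500, 40, 3
memberships = rng.dirichlet(np.ones(k), size=n_patients)   # patient mixtures
profiles = rng.uniform(0, 1, size=(k, n_codes))            # phenotype archetypes
probs = memberships @ profiles
A = (rng.uniform(size=probs.shape) < probs).astype(float)  # observed binary codes

# Rank-k factorization A ≈ X @ Y via truncated SVD of the centered matrix.
U, s, Vt = np.linalg.svd(A - A.mean(0), full_matrices=False)
X_hat = U[:, :k] * s[:k]   # patient embeddings (candidate phenotype scores)
Y_hat = Vt[:k]             # code loadings per latent factor

# The rank-k reconstruction should explain variance beyond the column means.
approx = A.mean(0) + X_hat @ Y_hat
err_k = np.mean((A - approx) ** 2)
err_0 = np.mean((A - A.mean(0)) ** 2)
print(f"rank-{k} reconstruction MSE {err_k:.3f} vs baseline {err_0:.3f}")
```

Rows of Y_hat play the role of phenotype definitions (which codes co-occur), while rows of X_hat score each patient against those phenotypes, which is how the factorization supports phenotype discovery.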